I CHALLENGED MYSELF
TO DRAW LIKE AN AI,
WHAT DID I LEARN?


Quite a bit actually... 


Intro 


Hello, I am tekKUh, also known as TecMaster000 on Newgrounds, a 15-year-old
digital artist who likes to draw for fun and is constantly fascinated by
technology, especially today’s.


I haven’t been fond of AI companies stealing artists’ data and calling
it “Publicly Accessible Data”, but nevertheless, the way these image
generating AIs actually work will never fail to amaze me.


I mainly challenged myself to do this because I knew there had to be
some flaw, despite the absolutely astonishing way it all happens. The AI
itself doesn’t have a voice, so I decided to be its voice.


So how does AI Image Generation even work?


(NOTE: This is just a super simple overview; you are literally one search away from diving into the rabbit hole of AI
Image Generators.)


If I was going to have any chance of drawing like the AI, I needed to copy
its method as best I could.


I began by just searching for how it works, clicking wherever felt like the
right way to go. A very useful video I learnt from was Gonkee’s “How
Stable Diffusion Works” on YouTube.


But for those who just want it in short, here’s what happens... 


Step 1: Training 


It’s obvious to anyone that AI doesn’t actually make anything unique; it’s
just trained on a TON of images. But not just images: the AI also needs to
know what each image means.


So in short, images get processed, with or without manual intervention,
using clever tricks so it doesn’t take years, and get attached to words, plus
a ton of words that are similar or connected [this is why, if you were to tell
it to make a table, chances are it would also draw a floor under it],
essentially making a text-to-image interpreter, almost like a translator!


After a while of training, we get a cool model that’s basically a
game card, just of processed data, ready to generate “new” images.


But what does the AI actually do when it starts making an image???


Step 2: Make noise 


Yes, the AI’s first step in making refined, pristine images is going full
HowToBasic with the canvas, generating some amazing noise first to then
denoise into a nice image.


[because yes, that’s also what it gets trained on: denoising garbage to THEN
make the image, like polishing. That’s essentially what it was trained to do
in the end. It makes sense now!]
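To picture what “making noise” means, here is a minimal Python sketch that fills a small canvas with random colours. It is only an illustration: the function name `make_noise` and the canvas size are made up for this example, and real generators such as Stable Diffusion sample Gaussian noise in a compressed internal representation rather than picking random RGB pixels.

```python
import random

def make_noise(width, height, seed=None):
    """Return a height x width grid of random RGB pixels (each value 0-255).

    Toy stand-in only: uniform random pixels, not the Gaussian noise
    a real image generator starts from.
    """
    rng = random.Random(seed)
    return [
        [(rng.randrange(256), rng.randrange(256), rng.randrange(256))
         for _ in range(width)]
        for _ in range(height)
    ]

# Hypothetical canvas size, chosen small so it runs instantly.
canvas = make_noise(148, 148, seed=0)
print(len(canvas), "rows of", len(canvas[0]), "pixels")
```

Passing the same `seed` gives the same noise again, which is also why image generators can reproduce an image from a seed.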


Step 3: Polish it 


After it feels content with its mathematically accurate noise, it starts to
use its data to:


• Connect the dots

o It has an entire word bank, so using your prompt it can connect
words up, filling in the gaps and highlighting apparently “important”
factors. Before, things like the number of fingers weren’t considered,
but now AI Image Generation can even make text really well!

• Polish your image slowly, as many times as it has been told

This is the main part of the process, so let’s explain this a bit further.


With the very cool noise, which is just colours splattered everywhere, every
pixel being insanely different, it does an inference.


...and what’s an inference?


Think of an inference (what most tools call an inference step or sampling
step) like a set of rough strokes, modifying the image so it is closer to
following the prompt. It is still literally like shoe polishing.


[Again, VERY simplified, and you are one search away from finding answers that will possibly break your neurons] 


So it just overlays it a bit with a thin layer of... pixels, from its trained data. 


It then repeats this one step for however many inferences it has been told to do.
It’s a computer after all; how would it decide that by itself?
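The “polish, then repeat” loop can be sketched in a few lines of Python. This is a toy under big assumptions: a real model has no finished target image to aim at; a neural network predicts what noise to remove at each step, guided by the prompt. Here a fixed list called `target` (a made-up stand-in for “what the prompt describes”) plays that role, and `polish`, `generate` and `error` are names invented for this example.

```python
import random

def polish(image, target, strength):
    """One rough 'inference': nudge every pixel a little toward the target."""
    return [p + strength * (t - p) for p, t in zip(image, target)]

def generate(target, steps, strength=0.2, seed=0):
    """Start from pure noise and polish it a fixed number of times."""
    rng = random.Random(seed)
    image = [rng.random() for _ in target]   # noise: one random value per pixel
    for _ in range(steps):                   # the step count comes from the user
        image = polish(image, target, strength)
    return image

def error(image, target):
    """Average distance between the image and the target (0 means identical)."""
    return sum(abs(p - t) for p, t in zip(image, target)) / len(target)

# Hypothetical five-pixel greyscale "image" the prompt is asking for.
target = [0.0, 0.25, 0.5, 0.75, 1.0]
for steps in (1, 5, 20):
    print(steps, "steps -> error", round(error(generate(target, steps), target), 4))
```

More steps leave less of the original noise behind, but the loop never checks whether the picture is already good enough; it simply stops when the count runs out.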


This is the part where I try to simulate this technique
If I can even call it one...


Nevertheless, I picked up my stylus and went to copy this exact
technique... So HERE GOES!!!


Step 1: Training 


I don’t feel the urge to stare at word banks and over 100,000,000 images;
that would just ruin my day. Fortunately, I already have some sort of
experience as a human being who has learnt a language and how to put
words into visual form, just like the AI did.


I think I can safely pass this ste...
Step 2: Make noise 


This is possibly an original experience, but the first step of my artistic
process was adding noise.


I was able to do this quite easily
via the noise effect.


[Me adding noise to a 148x148
canvas. I think it would’ve taken
too long overall with a high pixel
count canvas.]


Step 3: Polish it 


So we have reached the most important part of the process. How was I
going to imitate this??


VERY luckily for me, I have seen Stable Diffusion’s inpainting feature. It
essentially lets you watch the AI paint the image in, and it does it from noise!


You can see a quick demo of this feature here 


Pretty cool, isn’t it? Different models sometimes even have different ways
of doing this somehow, which I can’t explain. [Without inpainting, these can
still be figured out roughly by generating with a single inference, then two
inferences, etc., and putting the pieces together. Inpainting and straight-up
algorithmic generation can produce different results and show different ways
of doing things.]


So, going by what I had seen of each inference being generated on top of
the last, I went with the approach of using an airbrush: setting it to a
massive size and a low opacity, just overlaying it, and making the brush
slightly smaller with every layer of paint.
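As a rough sketch of that airbrush routine in Python: each layer blends a low-opacity coat of paint over whatever is already on the canvas, and the brush shrinks a little every time. Every number here (starting size, shrink factor, opacity, pixel values) is a made-up example value, not a setting actually used for the drawing, and `brush_sizes` and `overlay` are names invented for this example.

```python
def brush_sizes(start, layers, shrink=0.9):
    """Brush diameter for each layer: starts massive, gets slightly smaller."""
    return [start * shrink ** i for i in range(layers)]

def overlay(canvas_value, paint_value, opacity):
    """Blend one low-opacity coat of paint over the existing canvas value."""
    return (1 - opacity) * canvas_value + opacity * paint_value

sizes = brush_sizes(start=100, layers=10)   # hypothetical sizes in pixels
pixel = 0.9    # brightness of one noisy pixel (made-up value)
paint = 0.2    # brightness of the colour being airbrushed on (made-up value)
history = []
for size in sizes:
    pixel = overlay(pixel, paint, opacity=0.15)
    history.append(pixel)
    print(f"brush {size:6.1f}px -> pixel {pixel:.3f}")
```

Because the opacity is low, no single layer covers the noise; the pixel only drifts toward the paint colour as layers pile up, much like the repeated polishing described earlier.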


Now, with this technique, I will be drawing a character portrait. It’s just a
character with a circular face, big eyes and eyelashes, a small nose, yada yada
yada.


So after my 1st “inference”, I came out with this:


Using an airbrush, I just overlaid it. Pretty rough,
pretty okay so far. I am a human after all, so I never
faced the problem of accidentally making weird features.
I’m just goated with my model.


And I just continued


And continued.... 


Final Result >>>> 


And decided it was fine to stop. 


So that was a very interesting experience. I found out that I had many
advantages over AI, in that:


• I can stop random artefacts


• I’m not in a dream [where things just mutate randomly]


There’s this weird thing about AI Image Generation specifically where it
doesn’t keep context sometimes. Unfortunately, I didn’t want to do that,
because I actually have a soul that doesn’t want to ruin something like this,
and I also had a lot of training in what IS and what ISN’T useful.


So what's my conclusion after all of this? 


Before I make my conclusion about improvements, I actually want to use
this as an opportunity to make a few points about AI:


• AI Image Generation is a good tool.


o Obviously AI Image Generation isn’t perfect, and never will be. It’s a
tool. Think of my analogy of a “translator”: that is essentially what it
is, a text-to-image translator, trained on text and its translations
into images.


If you think about translation bots, have they replaced translators?
No, they are simply a tool. AI Image Generation should never be a
replacement unless it’s necessary and you don’t have the skills to
translate your words into another language, or into art.


• Don’t attack developers.


o This should be obvious to anyone, but the developers working on
this are amazing people and, for the most part, volunteers. They are
only doing this to bring a neat tool to us, using images
non-commercially for research. The only people you should blame for
the theft are the big corporations accessing “publicly available
information” [aka stolen work] and feeding it into these AI
algorithms for commercial purposes, which can be argued to be
illegal in most if not all cases.


With that rant out of the way, let’s actually conclude this:


• AI doesn’t yet know when to stop, and it doesn’t ask during its process either.


o This would be a very tedious and power-hungry task, but AI could be
made more intelligent by reviewing itself after each inference.


Or actually, better yet, it could ask the user. Could you imagine if,
after x amount of inferences, you were asked what to tweak, and it
could add that to the prompt and continue? This could, in my
opinion, reduce power usage as AI becomes more optimised and as
people need to generate images fewer times, heavily saving power
and even helping the world in doing so. Quality over quantity!


• AI model developers should work with artists to make AI-optimised
inputs for training


o Even if scraped data is used [which is fine for non-commercial, not
commercial, uses], some amount of good art without weird artefacts,
given as contributions from artists, could heavily improve the AI and
define for it what is actually good and what is actually bad. That
would help the AI and in return help the artists using it as a tool,
giving them a clearer image to take reference from.
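The “ask the user after x inferences” idea from the first point above could look something like this Python sketch. It is purely hypothetical: `generate_with_checkins`, `run_step` and `ask_user` are made-up names, the step function here is a fake that only records which prompt it was given, and no existing image generator is claimed to work this way.

```python
def generate_with_checkins(prompt, total_steps, ask_every, ask_user, run_step):
    """Run the polishing loop, pausing every `ask_every` steps to ask for tweaks.

    `run_step(state, prompt)` does one inference and returns the new state;
    `ask_user(step, state)` returns extra prompt text, or None to carry on.
    """
    state = None
    for step in range(1, total_steps + 1):
        state = run_step(state, prompt)
        # Check in with the user, but not after the very last step.
        if step % ask_every == 0 and step < total_steps:
            tweak = ask_user(step, state)
            if tweak:
                prompt = f"{prompt}, {tweak}"   # add the tweak and continue
    return state, prompt

def fake_step(state, prompt):
    """Stand-in for a real inference: just log the prompt that was used."""
    return (state or []) + [prompt]

def fake_user(step, state):
    """Stand-in for a person: asks for one change at step 5, then stays quiet."""
    return "bigger eyes" if step == 5 else None

state, final_prompt = generate_with_checkins("portrait", 15, 5, fake_user, fake_step)
print(final_prompt)
```

In this made-up run the tweak arrives after step 5, so the first five steps use the original prompt and the remaining ten use the updated one, instead of throwing the image away and starting a whole new generation.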


There are other minor things, like imperfections and obviously just that it
can’t do some things, but I believe they are already being worked on, or have
been made as a patch for Stable Diffusion.


AI Image Generation is an amazing tool, and amazing as a little help for
artists brainstorming ideas, but the biggest problem is the marketing that
it’s going to replace artists and be an artist itself, when it actually
shouldn’t, just like how translators haven’t been hugely affected by machine
translation, which is a much simpler task and is better than ever today.

Credits:


tekKUh - Writer and responsible for the project 


Honourable Mentions: 


DoAvyGirl - A cool supportive person


Ibis Inc. - Their amazing art app “Ibis Paint X” was what I used to make
my drawing!


You - For reading this till the end and potentially showing your support
for me and OPWriters Garage!!!


